dormant ratio
Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization
Li, Haoran, Jiang, Zhennan, Chen, Yuhui, Zhao, Dongbin
With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and poor training stability. Although consistency models, a time-efficient class of diffusion models, have been validated in online state-based RL, whether they can be extended to visual RL remains an open question. In this paper, we investigate the impact of non-stationary distributions and the actor-critic framework on the consistency policy in online RL, and find that the consistency policy is unstable during training, especially in visual RL with high-dimensional state spaces. To this end, we propose sample-based entropy regularization to stabilize policy training, and introduce a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art (SOTA) performance on 21 tasks across the DeepMind Control Suite and Meta-world. To our knowledge, CP3ER is the first method to apply diffusion/consistency models to visual RL, demonstrating the potential of consistency models in this setting. More visualization results are available at https://jzndd.github.io/CP3ER-Page/.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- Africa > Rwanda > Kigali > Kigali (0.05)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)
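The abstract above names sample-based entropy regularization as the stabilizer but gives no formula here. As a minimal sketch of the general idea only: estimate the policy's entropy from sampled actions (since a consistency policy has no tractable density) and add it to a standard actor objective. `policy.sample`, `critic`, the coefficient `alpha`, and the k-NN estimator are all illustrative assumptions, not CP3ER's actual components.

```python
# Minimal sketch of sample-based entropy regularization for a policy with
# an intractable density (the situation a consistency policy is in).
# policy.sample, critic, alpha, and the k-NN estimator are illustrative
# assumptions, not CP3ER's actual design.
import torch

def knn_entropy(actions: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Particle-based entropy estimate (Kozachenko-Leonenko style, constants
    dropped) from actions of shape (batch, num_samples, act_dim)."""
    dists = torch.cdist(actions, actions)                    # pairwise distances per state
    kth = dists.topk(k + 1, largest=False).values[..., -1]   # k-th neighbor (index 0 is self)
    return actions.shape[-1] * torch.log(kth + 1e-8).mean()  # entropy grows with the k-NN radius

def actor_loss(policy, critic, obs, num_samples: int = 8, alpha: float = 0.05):
    obs_rep = obs.unsqueeze(1).expand(-1, num_samples, -1)   # several action samples per state
    actions = policy.sample(obs_rep)                         # (B, N, act_dim), differentiable
    q_term = critic(obs_rep, actions).mean()                 # exploitation
    ent_term = knn_entropy(actions)                          # keeps the policy from collapsing
    return -(q_term + alpha * ent_term)
```

The point of the sample-based estimator is that it only needs draws from the policy, which a consistency model can provide even though it never exposes log-probabilities.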
Pretrained Visual Representations in Reinforcement Learning
Williams, Emlyn, Polydoros, Athanasios
Visual reinforcement learning (RL) has made significant progress in recent years, but the choice of visual feature extractor remains a crucial design decision. This paper compares the performance of RL algorithms that train a convolutional neural network (CNN) from scratch with those that utilize pre-trained visual representations (PVRs). We evaluate the Dormant Ratio Minimization (DRM) algorithm, a state-of-the-art visual RL method, against three PVRs: ResNet18, DINOv2, and Visual Cortex (VC). We use the Metaworld Push-v2 and Drawer-Open-v2 tasks for our comparison. Our results show that whether training from scratch or using PVRs maximises performance is task-dependent, but PVRs offer advantages in terms of reduced replay buffer size and faster training times. We also identify a strong correlation between the dormant ratio and model performance, highlighting the importance of exploration in visual RL. Our study provides insights into the trade-offs between training from scratch and using PVRs, informing the design of future visual RL algorithms.
- Europe > United Kingdom > England > Lincolnshire > Lincoln (0.04)
- Asia > South Korea > Daegu > Daegu (0.04)
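For concreteness, here is one way the frozen-PVR setup the paper describes can look in PyTorch: an ImageNet-pretrained ResNet18 from torchvision with its classification head removed serves as a fixed encoder, and observations are stored as 512-d features rather than pixels, which is where the smaller replay buffer comes from. The preprocessing and freezing details are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of a frozen-PVR setup: an ImageNet-pretrained ResNet18
# from torchvision as a fixed feature extractor. Preprocessing values are
# the standard ImageNet statistics; the RL agent consuming the features
# is omitted.
import torch
import torchvision.models as models
import torchvision.transforms as T

resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
encoder = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()  # drop the fc head
for p in encoder.parameters():
    p.requires_grad_(False)  # frozen: the PVR is never fine-tuned

preprocess = T.Compose([
    T.Resize(224, antialias=True),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def encode(obs: torch.Tensor) -> torch.Tensor:
    """(B, 3, H, W) float images in [0, 1] -> (B, 512) features. Storing
    features instead of raw pixels is what shrinks the replay buffer."""
    return encoder(preprocess(obs)).flatten(1)
```

With DINOv2 or VC the encoder line changes, but the pattern is the same: freeze the backbone, encode each observation once, and store the features.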
DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization
Xu, Guowei, Zheng, Ruijie, Liang, Yongyuan, Wang, Xiyao, Yuan, Zhecheng, Ji, Tianying, Luo, Yu, Liu, Xiaoyu, Yuan, Jiaxin, Hua, Pu, Li, Shuzhen, Ze, Yanjie, Daumé III, Hal, Huang, Furong, Xu, Huazhe
Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite this progress, current algorithms are still unsatisfactory in virtually every aspect of performance, including sample efficiency, asymptotic performance, and robustness to the choice of random seeds. In this paper, we identify a major shortcoming of existing visual RL methods: agents often exhibit sustained inactivity during early training, limiting their ability to explore effectively. Expanding on this observation, we further reveal a significant correlation between agents' inclination toward motorically inactive exploration and the absence of neuronal activity within their policy networks. To quantify this inactivity, we adopt the dormant ratio as a metric to measure inactivity in the RL agent's network. Empirically, we also find that the dormant ratio can act as a standalone indicator of an agent's activity level, regardless of the received reward signals. Leveraging these insights, we introduce DrM, a method that uses three core mechanisms to guide agents' exploration-exploitation trade-offs by actively minimizing the dormant ratio. Experiments demonstrate that DrM achieves significant improvements in sample efficiency and asymptotic performance with no broken seeds (76 seeds in total) across three continuous control benchmark environments: the DeepMind Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains of the DeepMind Control Suite, as well as three dexterous hand manipulation tasks in Adroit without demonstrations, all from pixel observations.
- Asia (0.14)
- North America > United States > Maryland (0.14)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
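The dormant ratio DrM minimizes follows the usual dormant-neuron definition (Sokar et al., 2023): a neuron is τ-dormant when its mean absolute activation, normalized by the layer-wide mean, falls at or below a threshold τ. A minimal sketch of the metric, assuming the network exposes its activations through `nn.ReLU` modules and using an illustrative τ = 0.025:

```python
# Minimal sketch of the dormant-ratio metric (tau-dormant neurons, after
# Sokar et al., 2023). Assumptions: the network's nonlinearities are
# nn.ReLU modules we can hook, and tau = 0.025 is an illustrative default.
import torch
import torch.nn as nn

@torch.no_grad()
def dormant_ratio(net: nn.Module, obs: torch.Tensor, tau: float = 0.025) -> float:
    acts = []
    hooks = [m.register_forward_hook(lambda _m, _i, out: acts.append(out))
             for m in net.modules() if isinstance(m, nn.ReLU)]
    net(obs)                                   # one forward pass records activations
    for h in hooks:
        h.remove()

    dormant, total = 0, 0
    for a in acts:
        a = a.abs()
        if a.dim() > 2:                        # conv maps: average out spatial dims per channel
            a = a.mean(dim=tuple(range(2, a.dim())))
        score = a.mean(dim=0)                  # mean |activation| per neuron over the batch
        score = score / (score.mean() + 1e-9)  # normalize by the layer-wide mean
        dormant += (score <= tau).sum().item()
        total += score.numel()
    return dormant / max(total, 1)
```

Note that the computation uses only activations, never rewards, which is what lets the dormant ratio serve as the standalone, reward-independent activity indicator the abstract describes; DrM's three mechanisms then act on this scalar.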